 generic algorithm


Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage

Neural Information Processing Systems

We study distributionally robust offline reinforcement learning (RL), which seeks to find an optimal robust policy purely from an offline dataset that can perform well in perturbed environments. We propose a generic algorithm framework, Doubly Pessimistic Model-based Policy Optimization ($\texttt{P}^2\texttt{MPO}$), for robust offline RL, which features a novel combination of a flexible model estimation subroutine and a doubly pessimistic policy optimization step. Here the double pessimism principle is crucial to overcome the distribution shift incurred by i) the mismatch between the behavior policy and the family of target policies; and ii) the perturbation of the nominal model. Under certain accuracy assumptions on the model estimation subroutine, we show that $\texttt{P}^2\texttt{MPO}$ is provably sample-efficient with robust partial coverage data, which means that the offline dataset has good coverage of the distributions induced by the optimal robust policy and perturbed models around the nominal model. By tailoring specific model estimation subroutines for concrete examples including tabular Robust Markov Decision Process (RMDP), factored RMDP, and RMDP with kernel and neural function approximations, we show that $\texttt{P}^2\texttt{MPO}$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the number of trajectories in the offline dataset. Notably, these models, except for the tabular case, are first identified and proven tractable by this paper. To the best of our knowledge, we are the first to propose a general learning principle, double pessimism, for robust offline RL and to show that it is provably efficient in the context of general function approximations.
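
The double pessimism principle described above can be written schematically as a nested minimisation. This is only a sketch under assumed notation, not the paper's exact statement: $\hat{\mathcal{P}}$ is taken to be a confidence set of nominal models returned by the model estimation subroutine, $\Phi(P)$ the set of perturbed models around a nominal model $P$, $\Pi$ the family of target policies, and $V^{\pi}_{\tilde{P}}$ the value of policy $\pi$ under model $\tilde{P}$. All of these symbols are assumptions introduced for illustration.

```latex
\hat{\pi} \in \arg\max_{\pi \in \Pi}
  \; \inf_{P \in \hat{\mathcal{P}}}      % pessimism over model-estimation uncertainty
  \; \inf_{\tilde{P} \in \Phi(P)}        % pessimism over perturbations of the nominal model
  V^{\pi}_{\tilde{P}}
```

The outer infimum guards against the mismatch between the behavior policy and the target policies (source i), and the inner infimum guards against perturbation of the nominal model (source ii).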



Adaptive Neural Compilation

Neural Information Processing Systems

This paper proposes an adaptive neural-compilation framework to address the problem of learning efficient programs. Traditional code optimisation strategies used in compilers apply a pre-specified set of transformations that make the code faster to execute without changing its semantics. In contrast, our work adapts programs to make them more efficient while considering correctness only on a target input distribution. Our approach is inspired by recent work on differentiable representations of programs. We show that it is possible to compile programs written in a low-level language to a differentiable representation. We also show how programs in this representation can be optimised to make them efficient on a target input distribution. Experimental results demonstrate that our approach enables learning specifically-tuned algorithms for given data distributions with a high success rate.


A Generic Algorithm for Top-K On-Shelf Utility Mining

Chen, Jiahui, Guo, Xu, Gan, Wensheng, Wan, Shichen, Yu, Philip S.

arXiv.org Artificial Intelligence

On-shelf utility mining (OSUM) is an emerging research direction in data mining. It aims to discover itemsets that have high relative utility in their selling time period. Compared with traditional utility mining, OSUM can find more practical and meaningful patterns in real-life applications. However, there is a major drawback to traditional OSUM. For normal users, it is hard to define a minimum threshold minutil for mining the right amount of on-shelf high utility itemsets. On one hand, if the threshold is set too high, the number of patterns would not be enough. On the other hand, if the threshold is set too low, too many patterns will be discovered and cause an unnecessary waste of time and memory consumption. To address this issue, the user usually directly specifies a parameter k, where only the top-k high relative utility itemsets would be considered. Therefore, in this paper, we propose a generic algorithm named TOIT for mining Top-k On-shelf hIgh-utility paTterns to solve this problem. TOIT applies a novel strategy to raise the minutil based on the on-shelf datasets. Besides, two novel upper-bound strategies named subtree utility and local utility are applied to prune the search space. By adopting the strategies mentioned above, the TOIT algorithm can narrow the search space as early as possible, improve the mining efficiency, and reduce the memory consumption, so it can obtain better performance than other algorithms. A series of experiments have been conducted on real datasets with different styles to compare the effects with the state-of-the-art KOSHU algorithm. The experimental results showed that TOIT outperforms KOSHU in both running time and memory consumption.
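
The idea of replacing a user-chosen minutil with a parameter k can be illustrated with a small sketch. This is not the TOIT algorithm: it ignores on-shelf time periods and relative utility, enumerates itemsets by brute force, and uses none of the subtree-utility or local-utility pruning. It only shows how an internal threshold can be raised automatically as the top-k list fills. The function name `top_k_itemsets` and the toy transactions are hypothetical.

```python
import heapq
from itertools import combinations


def top_k_itemsets(transactions, k):
    """Brute-force top-k utility itemsets with an automatically raised threshold.

    Each transaction maps item -> utility of that item in the transaction.
    Illustrative only: no on-shelf periods, no pruning.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []      # min-heap of (utility, itemset), holds at most k entries
    minutil = 0    # internal threshold, raised once the heap is full
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            # Utility of an itemset: summed over transactions containing all its items.
            utility = sum(
                sum(t[i] for i in itemset)
                for t in transactions
                if all(i in t for i in itemset)
            )
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))
            if len(heap) == k:
                minutil = heap[0][0]  # k-th best utility seen so far
    return sorted(heap, reverse=True), minutil


transactions = [
    {"a": 5, "b": 2},
    {"a": 3, "c": 4},
    {"b": 6, "c": 2},
    {"a": 2, "b": 2, "c": 3},
]
best, final_minutil = top_k_itemsets(transactions, k=2)
print(best, final_minutil)
```

The user supplies only k; the threshold ends up equal to the utility of the k-th best itemset, which is the value a user would otherwise have had to guess.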


Machine Learning is Fun

#artificialintelligence

Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data. For example, one kind of algorithm is a classification algorithm. It can put data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code. It's the same algorithm but it's fed different training data so it comes up with different classification logic.
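
As a minimal sketch of that idea, the snippet below uses one unchanged classification routine on two unrelated toy datasets. The choice of a nearest-neighbour classifier, the function name `train_nearest_neighbor`, and the made-up features are all assumptions for illustration; the article does not name a specific algorithm here.

```python
def train_nearest_neighbor(examples):
    """Return a classifier that labels inputs by their closest stored example.

    `examples` is a list of (feature_tuple, label) pairs.
    """
    stored = list(examples)

    def classify(features):
        def squared_distance(example):
            return sum((a - b) ** 2 for a, b in zip(example[0], features))

        # The label of the nearest training example becomes the prediction.
        return min(stored, key=squared_distance)[1]

    return classify


# Toy "handwritten digit" data: (fraction of dark pixels, number of closed loops)
digits = [((0.30, 0), "1"), ((0.35, 0), "1"), ((0.60, 2), "8"), ((0.65, 2), "8")]
# Toy "email" data: (occurrences of the word "free", number of exclamation marks)
emails = [((0, 0), "not-spam"), ((1, 0), "not-spam"), ((4, 5), "spam"), ((6, 3), "spam")]

# Same code, different training data, different classification logic.
digit_classifier = train_nearest_neighbor(digits)
email_classifier = train_nearest_neighbor(emails)

print(digit_classifier((0.62, 2)))
print(email_classifier((5, 5)))
```

Nothing in `train_nearest_neighbor` knows about digits or emails; only the data fed to it differs.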


Machine Learning is Fun! Part 2 – Adam Geitgey – Medium

#artificialintelligence

Update: This article is part of a series. Check out the full series: Part 1, Part 2, Part 3, Part 4, Part 5, Part 6, Part 7 and Part 8! In Part 1, we said that Machine Learning is using generic algorithms to tell you something interesting about your data without writing any code specific to the problem you are solving. This time, we are going to see one of these generic algorithms do something really cool -- create video game levels that look like they were made by humans. We'll build a neural network, feed it existing Super Mario levels and watch new ones pop out! Just like Part 1, this guide is for anyone who is curious about machine learning but has no idea where to start.


Machine Learning is Fun! – Adam Geitgey – Medium

#artificialintelligence

Machine learning is the idea that there are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data. For example, one kind of algorithm is a classification algorithm. It can put data into different groups. The same classification algorithm used to recognize handwritten numbers could also be used to classify emails into spam and not-spam without changing a line of code.

